Node Sampling Using Random Centrifugal Walks
Abstract
Sampling a network with a given probability distribution has been identified as a useful operation. In this paper we propose distributed algorithms for sampling networks in which a special node, called the source, selects nodes according to a given probability distribution. All these algorithms are based on a new class of random walks that we call Random Centrifugal Walks (RCW). An RCW is a random walk that starts at the source and always moves away from it. First, we propose an algorithm that uses RCWs to sample any connected network. The algorithm assumes that each node has a weight, and the sampling process must select each node with probability proportional to its weight. This algorithm requires a preprocessing phase before nodes are sampled: a minimum diameter spanning tree (MDST) is built in the network, and the nodes' weights are then efficiently aggregated along the tree. The good news is that the preprocessing is done only once, regardless of the number of sources and the number of samples taken from the network. After that, each sample is obtained with an RCW whose length is bounded by the network diameter. Second, we propose RCW algorithms that require no preprocessing for grids and for networks with regular concentric connectivity, in the case where the probability of selecting a node is a function of its distance to the source. The key features of the RCW algorithms, unlike previous Markovian approaches, are that (1) they need no warm-up (stabilization) period, (2) sampling always finishes in a number of hops bounded by the network diameter, and (3) they select nodes with exactly the desired probability distribution.
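The abstract does not spell out the walk's transition rule. The Python sketch below shows one way a centrifugal walk on a tree can select a node with probability exactly proportional to its weight. It assumes the spanning tree has already been built and is rooted at the source, given as a child list; it does not model MDST construction, distributed message passing, or how the paper reuses one tree for many sources. After a one-time bottom-up aggregation of subtree weights T(v), each step either stops at the current node v with probability w(v)/T(v) or moves to a child c with probability T(c)/T(v). The function names (`subtree_weights`, `centrifugal_walk`, `selection_probabilities`) and the example node labels are hypothetical, chosen for illustration only; this is not presented as the authors' algorithm.

```python
import random
from typing import Dict, List, Optional, Tuple

Tree = Dict[str, List[str]]  # hypothetical encoding: node -> children; the root is the source


def subtree_weights(children: Tree, weight: Dict[str, float], root: str) -> Dict[str, float]:
    """Preprocessing (done once): total[v] = weight[v] + sum of total[c] over v's children."""
    order: List[str] = []
    stack = [root]
    while stack:  # iterative pre-order traversal: every parent precedes its children
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    total: Dict[str, float] = {}
    for v in reversed(order):  # children are aggregated before their parent
        total[v] = weight[v] + sum(total[c] for c in children.get(v, []))
    return total


def centrifugal_walk(children: Tree, weight: Dict[str, float], total: Dict[str, float],
                     root: str, rng: random.Random) -> Tuple[str, int]:
    """One sample: start at the root and only ever move to a child (away from the source).

    At node v, stop with probability weight[v] / total[v]; otherwise move to child c
    with probability total[c] / total[v]. Returns (selected node, hops taken).
    """
    v, hops = root, 0
    while True:
        r = rng.random() * total[v]
        if r < weight[v]:
            return v, hops
        r -= weight[v]
        kids = [c for c in children.get(v, []) if total[c] > 0]
        # Fallback for floating-point round-off: last child with positive weight, if any.
        nxt: Optional[str] = kids[-1] if kids else None
        for c in kids:
            if r < total[c]:
                nxt = c
                break
            r -= total[c]
        if nxt is None:
            return v, hops
        v, hops = nxt, hops + 1


def selection_probabilities(children: Tree, weight: Dict[str, float],
                            total: Dict[str, float], root: str) -> Dict[str, float]:
    """Exact probability that the walk above selects each node (no sampling involved)."""
    reach = {root: 1.0}  # probability that the walk ever visits each node
    prob: Dict[str, float] = {}
    stack = [root]
    while stack:
        v = stack.pop()
        t = total[v]
        prob[v] = reach[v] * weight[v] / t if t > 0 else 0.0
        for c in children.get(v, []):
            reach[c] = reach[v] * total[c] / t if t > 0 else 0.0
            stack.append(c)
    return prob


if __name__ == "__main__":
    children = {"s": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
    weight = {"s": 1.0, "a": 2.0, "b": 1.0, "c": 3.0, "d": 1.0, "e": 2.0}
    total = subtree_weights(children, weight, "s")  # total["s"] == 10.0
    print(selection_probabilities(children, weight, total, "s"))
    print(centrifugal_walk(children, weight, total, "s", random.Random(1)))
```

Why this rule gives the exact distribution: the probability of visiting a node v telescopes to T(v)/T(root), so the probability of stopping there is w(v)/T(root), with no warm-up phase. Since every hop moves one level deeper, the number of hops is at most the depth of the tree below the source.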
Publication date: 2012